Chapter 18 - Stereo


Many music traditions betray an interest in the positioning of instruments and vocalists relative to one another, especially when music is combined with drama. It is therefore appealing to construct electronic recording and reproduction systems that can capture and recreate these antiphonal, spatial effects. More fundamentally, music takes place in spaces, and the acoustic of those spaces is often a vital component of the overall effect. A mass setting by William Byrd would not sound "right" without the long reverberation of a large church or cathedral. A spatial audio system aims not only to capture and recreate the positions of the musicians themselves, but also to place them in a convincing acoustic. Synthetic music, too, aims to create not only music but a synthetic environment in which it is "performed".

Spatial Hearing

Figure 1 - Directional hearing cues

Consider the situation shown above, in which an experimental subject is presented with a source of sound located at some distance from the side of the head. The two most important cues the brain uses to determine the direction of a sound are due to the physical nature of sound and its propagation through the atmosphere and around solid objects. We can make two reliable observations:

At high frequencies, the loudness of a sound differs at the two ears. Because the head acts as a baffle, the nearer ear receives a louder signal than the remote ear.
At low frequencies, the baffling effect of the head is inoperative, but a delay exists between the sound reaching the near ear and the further ear.

It may be demonstrated that both effects aid the nervous system in its judgement as to the location of a sound source: At high frequencies, the head casts an effective acoustic "shadow" which acts like a low-pass filter and attenuates high frequencies arriving at the far ear, thus enabling the nervous system to make use of interaural intensity differences to determine direction. At low frequencies, sound diffracts and bends around the head to reach the far ear virtually unimpeded. So, in the absence of intensity-type directional cues, the nervous system compares the relative delay of the signals at each ear. This effect is termed interaural delay difference. In the case of steady-state sounds or pure-tones, the low-frequency delay manifests itself as a phase-difference between the signals arriving at either ear. The idea that sound localisation is based upon interaural time differences at low frequencies and interaural intensity differences at high frequencies has been called Duplex theory and it originates with Lord Rayleigh at the turn of the twentieth century.

Binaural Techniques

The simplest, and in some ways the best, stereo system was invented in 1881, when Clément Ader placed two microphones about eight inches apart (the average distance between the ears) on the stage of the Paris Opera, where a concert was being performed. He relayed these signals over telephone lines to two telephone earpieces at the Paris Exhibition of Electricity. By holding one earpiece to each ear, the amazed listeners heard a remarkably lifelike impression that they too were seated in the Opera audience. This was the first public demonstration of binaural stereophony, the word binaural being derived from the Latin for "two ears".

Figure 2 - Ader's so-called Théâtrophone at the Paris Exhibition of Electricity, in a contemporary lithograph

The techniques of binaural stereophony, little different from this original, have been exploited many times in the century since the first demonstration. However, psycho-physicists and audiologists have gradually realised that considerable improvements can be made to the simple spaced microphone system by encapsulating the two microphones in a synthetic head and torso. The illusion is strengthened still more if the artificial "dummy" head is provided with artificial auricles (external ears or pinnae). The binaural stereophonic illusion is improved by the addition of an artificial head and torso and external ears because it is now known that sound interacts with these structures before entering the ear canal. If, in a recording, microphones can be arranged to interact with similar features, the illusion is greatly improved in terms of realism and accuracy when the signals are relayed over headphones. This is because headphones sit right over the ears and thus do not interact with the listener's anatomy on the final playback.

Figure 3 - Contemporary "dummy head" for binaural recordings: note the addition of false pinnae

Binaural audio is theoretically capable of recreating perfect, accurate sound-fields, apparently reproducing sounds from all directions at the ears of the listener. Because the system requires only two discrete recording channels, it is efficient in engineering terms too. However, some important limitations should be noted. Firstly, experiments using mouldings of subjects' pinnae have consistently shown that subjects are far better at judging the direction of sounds when listening through castings of their own pinnae than through castings of another person's. It seems that during childhood we learn to listen with our "own ears". This would not be such a depressing limitation were it not for the fact that every person's pinnae are as unique as their fingerprints. Secondly, and most damning of all, there appears to be a very real commercial drawback in the system's reliance on headphones. Music listening is both a shared activity and one that is pursued alongside other activities, and headphones prevent both.

Crosstalk-cancellation

The desire to re-create spatial sound-fields without headphones has been recognised since the very earliest experiments with stereophony. However, if the signals from a dummy-head recording are replayed over two loudspeakers in the conventional stereophonic listening arrangement (loudspeakers at ±30° to the centre), the results are very disappointing. The reason is the two unwanted crosstalk signals: the signal from the right loudspeaker that reaches the left ear, and the signal from the left loudspeaker that reaches the right ear. These signals prevent the correct interaural time-delay cues from being reproduced at low frequencies. Several researchers have proposed and constructed systems in which complementary cancelling signals are fed to the loudspeakers to cancel these crosstalk signals (for example, the Roland RSS system and Thorn EMI Sensaura). Unfortunately, to work well, these systems required the listener to hold one very precise position, a situation which undermined the convenience and companionship of loudspeaker listening.
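
The principle of crosstalk cancellation can be sketched at a single frequency. This is a minimal illustration under heavily simplified assumptions, not the Roland or Sensaura design. The layout is assumed symmetric: each ear receives its own loudspeaker at unit gain, and the opposite loudspeaker at a hypothetical gain `g` with an extra delay `tau`. Pre-filtering the binaural pair by the inverse of this 2x2 acoustic matrix delivers exactly the binaural signals to the ears. That holds only while the real acoustic paths match the ones assumed in the design.

```python
import cmath
import math

def acoustic_matrix(f_hz, g, tau_s):
    """Hypothetical symmetric loudspeaker-to-ear matrix at one frequency.

    Row 0 is the left ear, row 1 the right ear; column 0 is the left
    loudspeaker, column 1 the right. Direct paths have unit gain; the
    crosstalk paths have gain g and an extra delay tau_s (both assumed).
    """
    c = g * cmath.exp(-2j * math.pi * f_hz * tau_s)
    return [[1, c], [c, 1]]

def invert_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def ear_signals(binaural, f_hz, g=0.7, tau_design_s=0.00025, tau_actual_s=None):
    """Binaural pair -> canceller (built for tau_design_s) -> actual acoustic paths."""
    if tau_actual_s is None:
        tau_actual_s = tau_design_s
    canceller = invert_2x2(acoustic_matrix(f_hz, g, tau_design_s))
    speaker_feeds = apply(canceller, binaural)
    return apply(acoustic_matrix(f_hz, g, tau_actual_s), speaker_feeds)
```

`ear_signals([1, 0], 2000)` returns approximately `[1, 0]`, so the left-ear binaural signal arrives alone. Passing `tau_actual_s=0.00035` simulates a listener who has moved. The canceller then no longer matches the room, and a large residual appears at the right ear.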

Two-Loudspeaker Stereophony

Two-loudspeaker stereophony has more restricted ambitions than binaural stereo. When listening to music on a two-channel loudspeaker system, a sound image is spread out in the space between the two loudspeakers. The reproduced image thus has some characteristics in common with the way the same music is heard in real life: individual instruments or voices (known as phantom images) each occupy, to a greater or lesser extent, a particular and distinct position in space. However, as a means of creating and re-creating a sound-event, it is limited in that the image occupies only the space bounded by the loudspeakers. Nevertheless, the system has proved popular and has endured for fifty years as the staple presentation of audio. Only in the last twenty or so years have multi-channel systems, which aim to produce artificial sound-fields that surround or "immerse" the listener, become a reality.

Two different techniques are used in the production of most stereo recordings. The first is a system invented in 1928 by Alan Blumlein, a British genius working for EMI. This is by far the most commonly employed system and is based on encoding phantom-image positions by means of inter-channel amplitude differences. The second system is much rarer, and is based on encoding inter-channel time differences. Both systems are described below.

Blumlein's (intensity derived) system

Blumlein was well aware of Duplex spatial-hearing theory and gives a good précis in his patent application (1932). He therefore expected that the high-frequency inter-aural intensity cues and low-frequency inter-aural delay cues would be formed differently.

Low frequency directional hearing

Figure 4 - A sound source auditioned in real life

Figure 4 illustrates a real sound source auditioned in real life. Considering the low-frequency case, the two ears of the listener are spaced a distance h apart. The sound source is placed so that its direction is θ° from the straight-ahead position. The sound will travel further to the right ear than to the left. If v is the velocity of sound in air, the time interval between the arrivals of the sound at the two ears will be,

$$\frac{h \sin\theta}{v}$$

Because h is small compared with the distance from the source there will be a phase difference,

$$\Phi = \frac{\omega h \sin\theta}{v}$$

where ω is 2π times the frequency of the sound wave.

If a recording and reproduction system can be designed which exactly recreates, by means of the correct sound pressures at the ears of the listener, the original time differences of arrival, the listener will experience a virtual sound source at angle θ.
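
As a quick numerical check of the two expressions above, the sketch below evaluates the interaural delay and phase difference. The ear spacing `h_m = 0.2` (roughly the eight inches quoted for Ader's microphones) and `v_m_s = 340` are assumed values. They are not figures given for this derivation.

```python
import math

def interaural_delay_s(theta_deg, h_m=0.2, v_m_s=340.0):
    """Time difference of arrival, (h sin theta) / v, in seconds (h, v assumed)."""
    return h_m * math.sin(math.radians(theta_deg)) / v_m_s

def interaural_phase_rad(freq_hz, theta_deg, h_m=0.2, v_m_s=340.0):
    """Phase difference, (omega h sin theta) / v, with omega = 2 pi f."""
    omega = 2 * math.pi * freq_hz
    return omega * h_m * math.sin(math.radians(theta_deg)) / v_m_s
```

Under these assumptions, a source at 90° gives a delay of about 0.59 ms. At 500 Hz, a source at 30° produces a phase difference of a little under one radian.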

Figure 5 - Phase differences due to inter-channel intensity ratios

It may be demonstrated by geometrical reasoning that, for a listener seated in relation to loudspeakers disposed as in Figure 5, the phase difference at the ears is,

$$\Phi_\delta = \frac{L - R}{L + R} \cdot \frac{\omega h \sin\psi}{v} \qquad \text{(A)}$$

This demonstrates that any given phase shift may be derived at the listener's ears by means of the appropriate ratio of in-phase signals (L and R) fed to loudspeakers set 2ψ° apart [1]. This equation is central to understanding how stereo systems operate. Figure 6 illustrates this diagrammatically.

Figure 6 - How two signals which differ only in magnitude from two loudspeakers combine at the ears to produce signals which differ only in phase. This is the mechanism by which stereo systems produce the illusion they do at frequencies below 1kHz
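
Comparing the phase difference in (A) with that of a real source at angle θ, (ω h sin θ)/v, gives sin θ = [(L - R)/(L + R)] . sin ψ. The sketch below uses that relation to estimate where the low-frequency phantom image lands for a given pair of channel gains. It assumes in-phase, non-negative gains. Its sign convention (positive angles towards the L loudspeaker) is an assumption of the sketch.

```python
import math

def phantom_angle_deg(l_gain, r_gain, psi_deg=30.0):
    """Low-frequency phantom-image angle implied by equation (A).

    sin(theta) = (L - R) / (L + R) * sin(psi), for in-phase channel gains.
    """
    if l_gain < 0 or r_gain < 0 or l_gain + r_gain == 0:
        raise ValueError("expects in-phase, non-negative gains, not both zero")
    ratio = (l_gain - r_gain) / (l_gain + r_gain)
    return math.degrees(math.asin(ratio * math.sin(math.radians(psi_deg))))
```

The sketch behaves as expected at the extremes:

- Equal gains place the image at the centre (0°).
- A single active channel places it at the loudspeaker (±ψ).
- A 2:1 gain ratio with the standard ±30° layout puts it about 9.6° from the centre.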

But how might the sound signals be encoded or captured to create the appropriate inter-channel amplitude ratio? One answer is to "steer" sound sources into a particular position using a ratiometric potentiometer designed progressively to attenuate one channel whilst progressively strengthening the other as the knob is rotated; the input being shared equally between both channels when the knob is in its centre (12 o'clock) position. Such a control is referred to as a panoramic potentiometer or pan-pot for short.

This technique has been the norm for the huge majority of stereo recordings for the last 50 years, and remains so today! In this procedure, each instrumentalist or vocalist is close-miked, and all the instruments are mixed together electrically inside the audio mixer, the apparent position of each within the stereo picture being set by its pan-pot. Note that all pan-pots encode stereo information by inter-channel intensity differences alone; they can therefore be regarded as a version of Blumlein's intensity-derived stereo system.
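
A pan-pot of the kind described above can be sketched as a pair of gain laws. The constant-power sine/cosine law used here is one common choice, assumed for illustration; the text only requires that the input be shared equally at the centre position. Whatever the law, the two outputs differ in amplitude alone, which is what makes a pan-pot an intensity-derived system.

```python
import math

def pan_gains(position):
    """Constant-power pan law (an assumed, illustrative choice).

    position: -1.0 = hard left, 0.0 = centre (12 o'clock), +1.0 = hard right.
    Returns (left_gain, right_gain) with left**2 + right**2 == 1.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4  # sweeps 0 .. pi/2 across the knob
    return math.cos(angle), math.sin(angle)

def pan(samples, position):
    """Split a mono track into a left/right pair that differ only in level."""
    g_l, g_r = pan_gains(position)
    return [s * g_l for s in samples], [s * g_r for s in samples]
```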

For capturing real sound-fields, something more subtle is required. Clark et al. (1958), describing the commercialisation of Blumlein's EMI stereo system [2] thirty years after Blumlein's original patent was written, show how the sound-field may be sampled so as to recreate the appropriate phase shifts at the listener's ears. Clark and his team opted for a coincident stereo microphone technique based on crossed figure-of-eight (velocity) microphones.

Figure 7 - The output of a velocity microphone follows a cosine law

Given that the output of a velocity microphone follows a cosine law, as shown in Figure 7, the voltages from a horizontal crossed pair of such microphones will be as follows. The microphones are placed together, angled 90 degrees apart, each with its maximum response at 45 degrees to the median plane:

$$E_L = k \sin(45^\circ + \theta_t)$$

$$E_R = k \sin(45^\circ - \theta_t)$$

where θ_t is the true angle of the recorded sound source from the median plane.

From which it may be derived that,

$$\frac{E_L - E_R}{E_L + E_R} = \tan\theta_t \qquad \text{(B)}$$
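
Equation (B) is easy to check numerically. The sketch below models the two coincident figure-of-eight microphones with the sine form used above. Note that sin(45° + θ_t) is the same as the cosine law cos(θ_t - 45°) measured from a microphone aimed at 45°. The sketch then compares the difference-over-sum ratio with tan θ_t. The sensitivity `k` cancels out. The test angles are kept within |θ_t| < 45°, because beyond that one microphone's output changes sign.

```python
import math

def crossed_pair_outputs(theta_t_deg, k=1.0):
    """Outputs of coincident figure-of-eight mics aimed 45 degrees either side."""
    t = math.radians(theta_t_deg)
    quarter = math.radians(45.0)
    e_l = k * math.sin(quarter + t)
    e_r = k * math.sin(quarter - t)
    return e_l, e_r

def difference_over_sum(e_l, e_r):
    return (e_l - e_r) / (e_l + e_r)

for angle in (-40.0, -15.0, 0.0, 20.0, 40.0):
    ratio = difference_over_sum(*crossed_pair_outputs(angle))
    print(f"{angle:6.1f} deg  ratio={ratio:.6f}  tan={math.tan(math.radians(angle)):.6f}")
```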

Encode - decode

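Equations A and B may be combined to produce a simple expression for an entire encode-decode chain. Set the phase difference in (A) equal to the phase difference that a real source at angle θ_a would produce, (ω h sin θ_a) / v. This cancels ω, h and v and leaves sin θ_a = [(L - R) / (L + R)] . sin ψ. Taking the loudspeaker feeds L and R to be the microphone voltages E_L and E_R and substituting (B) gives,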

$$\sin\theta_a = \tan\theta_t \cdot \sin\psi \qquad \text{(C)}$$

where θ_a is the apparent angle of reproduced sound.
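
The whole encode-decode chain of equation (C) can be tabulated in a few lines, which is essentially how a graph like Figure 8 is produced. The sketch below is a direct transcription of (C); the function name is hypothetical. It raises an error where tan θ_t . sin ψ exceeds 1, since (C) then has no real solution.

```python
import math

def apparent_angle_deg(theta_t_deg, psi_deg):
    """Equation (C): sin(theta_a) = tan(theta_t) * sin(psi)."""
    x = math.tan(math.radians(theta_t_deg)) * math.sin(math.radians(psi_deg))
    if abs(x) > 1.0:
        raise ValueError("tan(theta_t) * sin(psi) exceeds 1; no real apparent angle")
    return math.degrees(math.asin(x))

for psi in (30.0, 45.0):
    row = [round(apparent_angle_deg(t, psi), 1) for t in (0.0, 15.0, 30.0, 45.0)]
    print(f"psi = {psi:.0f} deg, theta_t = 0/15/30/45 deg -> theta_a = {row}")
```

With ψ = 30°, a source captured at 45° is reproduced at the loudspeaker position, 30°. This is the compression of the captured 90° to a reproduced 60° discussed in the text.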

Figure 8 - θ_t against θ_a from equation (C). See text.

We can plot (as Clark et al. did) θ_t against θ_a for various values of ψ. This graph is reproduced as Figure 8. The curves represent the perceived angle (y-axis) versus the captured angle (x-axis) for an encode-decode system. In this system, the signals are captured by perpendicular, crossed figures-of-eight and replayed over loudspeakers whose base angles subtend either 60° at the listening position (ψ = 30°) or 90° at the listening position (ψ = 45°).

As you can see, when the loudspeakers are disposed at 30° either side of the listener, as in the classic stereo layout, the captured sound-stage is cramped so that the original, captured 90° is compressed to the reproduced 60°. However, the scaling is fairly linear. Interestingly, the case for ±45° loudspeakers is plotted too in the figure. This illustrates that a 90 degree soundstage may be accurately produced by such a system. In fact, equation C illustrates that a perfect illusion of the original sound-event may be created by the EMI system, at least at frequencies below 700 Hz [3].

HF imaging

Unfortunately, neither Blumlein himself nor the post-war team of Clark, Dutton and Vanderlyn was able to offer such a thorough theoretical analysis of HF imaging. It was clear to them that amplitude differences alone, caused by the shadowing effect of the head, must account for the perception of direction at high frequencies [4]. But in the computer-less world of 1928, a thorough geometrical analysis of the baffling effect of the head and upper torso would have been a gargantuan task. Their approach as practical engineers was therefore empirical. The question they sought to answer was: now that we know what amplitude ratios are required to generate the correct phase-difference cues at LF, what ratios are required at high frequencies? As Clark et al. say in their 1958 paper [5],

As the quotation clearly illustrates, the experiments by Clark et al. established two points:

They confirmed the theoretical considerations above, and proved them correct to a high degree of accuracy for the LF image (±2°).
For any given, non-equal inter-channel ratio, a high-frequency phantom image is created in a different location from the low-frequency phantom image. Further, as they note, this phenomenon is well documented by other workers.

Really, this should not come as any great surprise; it would be fortuitous indeed if two entirely different perceptual mechanisms produced identical illusions for the same inter-channel amplitude ratio. The net result of these findings is that, on a real, wideband music signal, the stereo image is "smeared", with the high and low frequencies failing to "map" on top of one another. Figure 11 (left) illustrates this well: the frequency range around 10 kHz produces a much "steeper" curve than the low and mid-frequencies.

As the EMI REDD team put it in the manual for their Stereosonic mixing consoles (REDD 1959),

The image in Figure 9 is an attempt to give a visual analogy for this effect. Interestingly, the acoustic effect is analogous to chromatic aberration in a lens, in which high-frequency blue light is refracted differently from low-frequency red light.

Figure 9 - A visual analogy for the simultaneous creation of a low-frequency stereo-image and a high-frequency stereo-image

Clark and his team's solution to this aberration was simple. As they discovered,

They accomplished this signal manipulation by deriving a sum signal (L + R) and difference signal (L - R) and inserting a low-pass filter into the difference channel. They invented a matrix and filter circuit to accomplish this and they referred to this technique as Stereo Shuffling. Below is an illustration of their practical circuit and its implementation in the difference channel.

Figure 10 - EMI's Stereosonic Shuffler and its implementation in the difference channel of a matrixed stereo signal. Right: amplitude response of the low-pass filter in the difference channel
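
The signal flow of a shuffler can be sketched digitally: sum and difference matrixing, with a low-pass characteristic in the difference channel. This is not a model of EMI's actual circuit. The first-order shelving filter, the 700 Hz corner (`cutoff_hz`) and the high-frequency difference gain (`hf_gain`) are hypothetical values chosen for illustration. The difference signal passes at unity gain at low frequencies and is reduced at high frequencies. For the same channel ratio, this pulls the HF image back towards the LF image. No compensating delay is applied to the sum channel, so the sketch also exhibits the sum/difference phase mismatch discussed later in the chapter.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """First-order IIR low-pass with unity gain at DC."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y, out = 0.0, []
    for sample in x:
        y += a * (sample - y)
        out.append(y)
    return out

def shuffle(left, right, fs_hz=48000.0, cutoff_hz=700.0, hf_gain=0.7):
    """Hypothetical shuffler: M/S matrix with a low-pass shelf on the difference."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    side_lp = one_pole_lowpass(side, cutoff_hz, fs_hz)
    # Difference gain is 1 at LF and falls towards hf_gain at HF.
    side_out = [hf_gain * s + (1.0 - hf_gain) * s_lp for s, s_lp in zip(side, side_lp)]
    new_left = [m + s for m, s in zip(mid, side_out)]
    new_right = [m - s for m, s in zip(mid, side_out)]
    return new_left, new_right
```

A centred (L = R) signal passes through untouched, because its difference component is zero. A steady left-only signal settles back to left-only once the filter has converged.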

Loss of the Shuffler

Unfortunately, the EMI Shuffler circuit, partly due to a lack of comprehension and partly due to a sub-optimal implementation, was either omitted or abandoned as part of the standard stereo system. A belief gradually took hold that congruent low-frequency and high-frequency stereo images may be created by simple inter-channel amplitude ratios. Even the best of the expensive and exotic mixers of today and yesteryear ignore the findings of EMI and others. As a result, stereophonic sound has never, in sixty years, delivered on the promise its inventor had for it nearly a century ago! Rather, compromise and "good enough" have ruled the day. The loss of the EMI Shuffler technique left the stereo system broken, a situation which has effectively lasted until today.

Delay-derived stereophony - the Precedence effect or Law of the first wave-front

Psychological experiments demonstrate that there is another method of steering a phantom image into positions along the axis between two loudspeakers: inter-channel delay differences rather than intensity differences. Clearly a central phantom image results when both channels receive identical signals, identically timed; in fact, this situation is indistinguishable from that produced in intensity-derived localisation. Experiments using a variety of sounds demonstrate that, when the inter-channel delay is approximately 1 ms, the sound localises at the loudspeaker receiving the earlier of the two signals. The postulated mechanism for this phenomenon is an inhibition system in auditory processing which suppresses directional cues arriving approximately 1 ms after the first set of interaural cues. This mechanism is termed the law of the first wave-front, or the precedence effect. The auditory system is believed to have developed over millions of years in the presence of reverberation. It therefore prioritises the first set of auditory cues it receives, which it takes to be the direct sound, over later signals, which it assumes are reverberation. Between the two extremes of 0 ms delay (where the image is central) and 1 ms delay (where the image localises at the loudspeaker producing the earlier signal), a progressive but confused relationship emerges. In this region, the auditory system appears partially to fuse the staggered signals and tries to derive reliable directional information. The orthodox conjecture is that inter-channel delay-derived stereophony presented via loudspeakers works because of some kind of gradual onset of the precedence effect.

However, delay-derived stereophony isn't a reliable system, and a moment's thought will demonstrate why. Imagine a wideband sound (composed of many simultaneous sine waves) panned away from the centre by introducing a delay D. Those frequency components whose wavelengths are similar to D . S_s (where S_s is the velocity of sound in air) will recentralise. For them, the delay amounts to a whole period, so the phase difference between the channels returns to zero. Figure 11 (right-hand graph) illustrates the results of a study by Wendt (Blauert 1983) and shows the positions of phantom images derived by inter-channel delay difference. As the graph shows, perceived position depends heavily on the frequency content of the experimental stimulus [6]. Note that the results for signals with a wavelength similar to D . S_s do show exactly the predicted recentralising effect.

Figure 11 - Amplitude derived (left) & delay derived (right) phantom-image positions.
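
The recentralising argument can be made concrete with a little arithmetic. A delay D amounts to a whole number of periods, and hence zero inter-channel phase difference, at the frequencies n/D, whose wavelengths are D . S_s divided by n. The sketch below lists these frequencies. S_s = 340 m/s is the nominal value the chapter uses later; the 20 kHz upper limit is an assumption.

```python
import math

def recentralising_frequencies(delay_s, f_max_hz=20000.0):
    """Frequencies at which a delay of delay_s equals a whole number of periods."""
    n_max = math.floor(f_max_hz * delay_s + 1e-9)  # small guard against rounding
    return [n / delay_s for n in range(1, n_max + 1)]

def longest_recentralising_wavelength_m(delay_s, s_s=340.0):
    """Wavelength D . S_s, i.e. the n = 1 case."""
    return delay_s * s_s
```

For D = 1 ms the first such frequency is 1 kHz (wavelength 0.34 m), followed by 2 kHz, 3 kHz and so on. A wideband source panned with a 1 ms delay therefore has components that recentralise at regular intervals across the whole audio band.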

Inter-channel intensity-derived stereo is greatly the superior system (a comparison of the left and right graphs of Figure 11 demonstrates this admirably). As a result, the vast majority of stereo material is produced using inter-channel intensity coding. In short, most records are multi-tracked and electrically panned. Nevertheless, delay-derived stereo has an important application in classical music recording, through the use of spaced pressure-sensitive (omnidirectional) microphones. This system, and its "cousin" the Decca Tree, create a stereo illusion from time-of-arrival information collected at spaced microphones. The practical arrangements are calibrated to give results similar to those illustrated in Fig. 11 (right). From these we deduce that at least 1 ms of delay is needed to give a full-left/full-right impression. Consider a one-metre spacing and a maximum obliquity of 30 degrees from the centre (which implies the spaced pair is deployed very close to the performers). The maximum inter-channel delay is then given by,

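$$\Delta t_{\max} = \frac{d \sin 30^\circ}{S_s}$$

where d = 1 m is the microphone spacing and S_s is the speed of sound in air expressed in metres/second. This gives Δt_max in seconds; multiply by 1000 for milliseconds.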

Taking S_s as the nominal 340 metres/second, the maximum inter-channel delay is 1.47 ms. This also explains why the infamous 3:1 rule exists for the layout of omni microphones. If the microphones are some distance from the performers, the obliquity is limited, so the microphones must be spaced further apart to obtain an adequate inter-channel delay.
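
The same far-field geometry can be turned round to ask how wide a spaced pair must be. The sketch below uses the chapter's nominal S_s = 340 m/s. It computes the maximum inter-channel delay for a given spacing and obliquity, and the spacing needed to reach a target delay. The function names and the 10° example are illustrative.

```python
import math

def max_interchannel_delay_ms(spacing_m, obliquity_deg, s_s=340.0):
    """Extra path spacing * sin(obliquity), converted to a delay in milliseconds."""
    return 1000.0 * spacing_m * math.sin(math.radians(obliquity_deg)) / s_s

def spacing_for_delay_m(delay_ms, obliquity_deg, s_s=340.0):
    """Spacing needed to reach delay_ms at the given maximum obliquity."""
    return (delay_ms / 1000.0) * s_s / math.sin(math.radians(obliquity_deg))

print(round(max_interchannel_delay_ms(1.0, 30.0), 2))  # 1.47
print(round(spacing_for_delay_m(1.0, 10.0), 2))        # 1.96
```

With only 10° of obliquity, roughly 2 m of spacing is needed to reach the same 1 ms. This is the trend the text describes for microphones placed further from the performers.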

Many beautiful recordings have been made using the spaced-omni and Decca-Tree approach. However, it is only fair to point out that no successful theoretical underpinning has ever been derived for such recordings as the time-of-arrival data is scrambled on replay over loudspeakers. This is illustrated diagrammatically in Figure 12.

Figure 12 - The failure of spaced, omnidirectional microphones to create auditory cues similar to those experienced in the real sound-field. Here the omnidirectional microphones and speakers are replaced by the analogy of windows in a wall.

Once again, we invoke the precedence effect to explain this, so that the later signals (shown bracketed) are suppressed in favour of the earlier. As recording is as much an artistic enterprise as it is an engineering discipline, this lack of engineering rigour is justifiable.

Wavefield synthesis (WFS) - Holographic stereophony

There exists a parallel literature in the history of spatial sound reproduction which aims to recreate the original sound-field at the listening position. These techniques are termed wave field synthesis (WFS). The argument runs something like this: the ideal mono reproduction system provides a window from the listening space (typically a small room) into the performance room (a much larger space, like a concert hall). This is illustrated in Fig. 13.

Figure 13 - If mono recordings may be formalized as an acoustic "window" upon the original performance space, why can we not think of stereo as two windows, and 5.1 systems as offering 5 windows?

Extending this argument, stereo appears to provide two windows into the performance space, and multi-channel audio systems (like 5.1, considered in the next chapter) extend this to multiple windows, as shown in the diagram. It follows that there must be a point at which enough wall has been replaced by windows that the listener's experience will match that of being in the hall itself, rather than in the listening space. If enough microphones sample the sound-field, and enough loudspeakers are used to recreate it, the listener will experience the sound waves as they were in the original hall. She will therefore be able to turn her head, move around within the space, and experience the sound just as she would have done at the original concert. The aim is therefore to recreate the amplitude and the phase of the sound waves as they originally were. The analogy is often drawn with holography, and an alternative term for wave field synthesis is holographic stereophony. Wave field synthesis relies on Huygens' Principle, which states that,

Each point on a primary wave-front can be considered to be a new source of a secondary spherical wave and that a secondary wave-front can be constructed as the envelope of these secondary waves.

Figure 14 illustrates Huygens' principle. In our case, each electro-acoustic channel (microphone-loudspeaker) is one of Huygens' secondary sources as shown in Figure 15.

Figure 14 - Huygens' Principle - Each point on a primary wave-front can be considered to be a new source of a secondary spherical wave and that a secondary wave-front can be constructed as the envelope of these secondary waves

The problem with this argument is that, without an enormous number of channels and loudspeakers, a WFS system can never provide a faithful spatial reconstruction of a sound-field. Sometimes it is assumed that the wave-front reconstruction approach underpins spaced microphone techniques. This is quite wrong and the reason is illustrated in Fig. 12. In the upper part of the diagram, the sound emanating from an instrument is detected by a pair of ears in the original performance space. In the lower part of the diagram, the acoustic signals experienced via two windows are illustrated.

Figure 15 - In wave-field synthesis, each electro-acoustic channel (microphone-loudspeaker) is one of Huygens' secondary sources

Imagine a sound like a sharp, single shot on a drum (nearly an impulse); we can then derive the sounds which will arrive at the two ears in each case. These are illustrated too. It is immediately obvious that the signals arriving at the ears are completely different in each situation. In fact, the only reason the "two-windows" system works is that the law of the first wave-front suppresses the later cues. Systematic mathematical analysis computes the integral of the secondary wave-fronts and compares it with the original wave-front. It shows that holographic stereo can only be made to work by sampling the approaching wave-front at many, many points, as if through a very wide window (Figure 16). Electrically, this could be achieved with a "curtain" of microphones producing signals carried in separate channels and reproduced by a separate "curtain" of loudspeakers. Theoretical analysis demonstrates that, for non-aliased spatial images, the spacing between adjacent transducers must be less than one wavelength at the highest reproduced frequency. In other words, a microphone and loudspeaker every 2 cm! The finite length of the loudspeaker array also causes problems, in the form of so-called truncation artefacts. These are analogous to the hard edge of the listening window, and to the requirement for windowing in the time-frequency Fourier transform.

Figure 16 - The only way that holographic stereo can be made to work is to construct a system in which the approaching sound wave-front is sampled at many, many points - as if through a very wide window.
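
The transducer-spacing requirement is simple arithmetic: spacing = fraction × S_s / f_max. The sketch below uses the one-wavelength criterion stated in the text by default. The `fraction_of_wavelength` parameter lets a stricter criterion be tried, such as the half-wavelength spacing often quoted for sound arriving from arbitrary directions; this halves the spacing and roughly doubles the count. The 20 kHz upper frequency and the 10 m array length below are assumptions.

```python
import math

def max_transducer_spacing_m(f_max_hz, s_s=340.0, fraction_of_wavelength=1.0):
    """Largest spacing satisfying spacing < fraction * wavelength at f_max."""
    return fraction_of_wavelength * s_s / f_max_hz

def transducers_needed(array_length_m, f_max_hz, s_s=340.0, fraction_of_wavelength=1.0):
    """Transducers for a straight array of the given length (both ends included)."""
    spacing = max_transducer_spacing_m(f_max_hz, s_s, fraction_of_wavelength)
    return math.ceil(array_length_m / spacing) + 1
```

At 20 kHz, the one-wavelength spacing is 1.7 cm, close to the 2 cm quoted above. On that basis, a 10 m curtain needs 590 microphones and 590 loudspeakers.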

Sweet spot

The law of the first wave-front imposes limitations on intensity-derived stereophony. Our auditory system prefers initial auditory cues over subsequent ones. As a result, the illusion of intensity-derived stereo collapses when the listener moves away from the listening position at the apex of the equilateral triangle formed with the loudspeakers (see Figure 5). The stereo image is therefore said to rely on the listener occupying a small sweet spot where the image is reliably experienced.
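
How small the sweet spot is can be estimated by computing the inter-loudspeaker arrival-time difference for a listener who moves sideways. The sketch assumes the equilateral layout described above, with a hypothetical 2 m listening distance and S_s = 340 m/s. It ignores the head entirely and considers only the path lengths.

```python
import math

def offset_delay_ms(offset_m, listen_dist_m=2.0, half_angle_deg=30.0, s_s=340.0):
    """Arrival-time difference between the two loudspeakers for a listener moved
    sideways by offset_m from the apex of the listening triangle.

    A positive result means the right loudspeaker's sound arrives first.
    """
    half = math.radians(half_angle_deg)
    # Loudspeakers at (-x_s, y_s) and (+x_s, y_s); the ideal seat is the origin.
    x_s = listen_dist_m * math.sin(half)
    y_s = listen_dist_m * math.cos(half)
    d_left = math.hypot(-x_s - offset_m, y_s)
    d_right = math.hypot(x_s - offset_m, y_s)
    return 1000.0 * (d_left - d_right) / s_s
```

Under these assumptions, a sideways move of only 10 cm produces roughly 0.3 ms of inter-loudspeaker delay. That is already a substantial fraction of the 1 ms at which the precedence effect pulls the image fully to the nearer loudspeaker.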

Stereophony in large halls

Apart from the fact that the majority of records are mixed for domestic surroundings, the purist two-loudspeaker, summing stereophony proposed and engineered by the EMI team fares badly when reproduction is required in a large auditorium. Inevitably, in a large space, many people will not be located at the ideal listening position or sweet spot. Analysis demonstrates that a listener seated far from the sweet spot will experience incorrect, or even paradoxical, phase differences at the ears. As a result, the image (at certain frequencies) will even appear outside the baseline of the loudspeakers.

On the other side of the Atlantic, in America, the demand for stereo sound came predominantly from the cinema industry. Their starting point was therefore entirely different. Steinberg and Snow's work at Bell Telephone Laboratories (1934) in the early nineteen-thirties was the first serious and systematic investigation into spatial audio in America (their term was Auditory Perspective). Their investigations started with the premise that they wanted to re-create in a large auditorium the spatial positions of actors on a sound-stage. Their equipment was disposed as shown in Figure 17. Their methodology was empirical and was based on an experiment in which a group of listeners in an auditorium with various numbers of loudspeakers marked the perceived positions of a caller in a separate room, in which various numbers of microphones were installed.

Figure 17 - Steinberg and Snow's work on Auditory Perspective

The real position of the caller is marked on the left-hand side. The results for a three-microphone, three-loudspeaker arrangement are shown in the diagram (top). There is some distortion of the depth of the illusion, especially in the centre. Nevertheless, the results were deemed acceptable. They then tried a two-channel arrangement, and those results are shown in the diagram too. The major distortion here exists in the centre of the soundstage, which is probably the worst place for the image to be distorted. They tried further hybrids of three microphones to two loudspeakers and two microphones to three loudspeakers, in search of a more economical arrangement (remember that microphones, amplifiers and loudspeakers were expensive in 1930). None was found, and they concluded that three channels were the minimum needed to effect a reasonable illusion of width and depth. This work formed the foundation of the multi-channel cinema audio systems which we shall meet in the next chapter.

An intermediate conclusion

We have arrived at a point at which we can survey past and present stereo sound systems and, unfortunately, the summary is rather a dispiriting one:

- Binaural techniques, capable of remarkable realism and engineering parsimony, have consistently failed to attract consumer interest or acceptance.
- Conventional, two-channel, inter-channel intensity-derived stereophony is the most successful, and certainly the hardiest, of the systems, but it was broken within years of its invention and never mended.
- Wave field synthesis has yet to leave the laboratory.
- The various spaced-microphone systems were either originally conceived for cinema or labour under a fallacious theoretical model.

All in all, not a pretty picture! It is with this depressing litany in mind that the rest of the chapter revisits Blumlein's stereophony and how it might be improved. Despite its rather drab lack of modernity, intensity-derived stereo remains the format which works in cars, on personal stereos and on hi-fi systems of all types. It has outlived its many challengers, from Quadrophony to 3D-sound, and survived into the digital, online age as the format of the vast majority of MPEG files. It is also the format in which the vast majority of the rich back-catalogue of music is preserved. So any improvement which might be teased from this old technology may benefit past, as well as future, recordings.

Improving stereo

Improving Image Sharpness by means of Inter-channel Crosstalk

You will remember that channel-intensity panned stereo recordings (and stereo recordings made with coincident microphones) are all "broken" unless the channel intensities are modified with respect to frequency. Figure 9 illustrated a visual analogy of the effect, showing how, as the sound image moves further from the centre, it is split into non-coincident high-frequency and low-frequency components. The EMI team who developed the modern, intensity-derived stereo system tackled this problem by inserting a low-pass filter in the difference (L - R) channel and compensating delays in the sum (L + R) channel. They called this circuit the Shuffler ^[7].
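The sum/difference treatment described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EMI's circuit: the one-pole filter, the 700Hz crossover and the HF difference gain of 0.7 are all illustrative choices, the names `one_pole_lowpass` and `shuffle` are hypothetical, and the compensating delay in the sum channel is omitted. The filter is applied as a low-pass shelf, so the LF difference signal passes unchanged and only its HF remainder is scaled down.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y


def shuffle(left, right, fs, cutoff_hz=700.0, hf_side_gain=0.7):
    """Hypothetical sum/difference shuffler: shelve down the HF part of L - R.

    cutoff_hz and hf_side_gain are illustrative assumptions, not EMI values.
    hf_side_gain = 0 would be a plain low-pass in the difference channel.
    """
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]   # sum (L + R) / 2
    side = [(l - r) / 2.0 for l, r in zip(left, right)]  # difference (L - R) / 2
    side_lf = one_pole_lowpass(side, cutoff_hz, fs)
    # LF difference passes unchanged; only its HF remainder is scaled down.
    side_out = [lf + hf_side_gain * (s - lf) for s, lf in zip(side, side_lf)]
    # The sum path passes straight through (EMI's compensating delay omitted).
    new_left = [m + s for m, s in zip(mid, side_out)]
    new_right = [m - s for m, s in zip(mid, side_out)]
    return new_left, new_right


fs = 48000
n = 4000
hard_left_hf = [(-1.0) ** i for i in range(n)]  # Nyquist-rate tone, left only
out_l, out_r = shuffle(hard_left_hf, [0.0] * n, fs)
print("HF |L - R| before: 1.0, after:", round(abs(out_l[-1] - out_r[-1]), 3))
```

With these assumed settings a hard-left HF tone leaves with its inter-channel difference reduced to roughly 0.71 of its original value, while a mono (L = R) signal and a steady LF signal pass through unchanged: only the HF image is pulled towards the centre.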

The Shuffler was regarded by the EMI team as central to their Stereosonic system, and Shuffler filters were built into the iconic REDD consoles (Figure 18) for EMI recording engineers, so that all EMI stereo recordings could be made using these circuits. But the Shufflers gradually fell into disuse, partly out of misunderstanding and ignorance among recording engineers, and partly because of unintended colouration: unequal group delay in the sum and difference channels produced comb-filter effects in the frequency response of the console.

Figure 18 - EMI's iconic REDD.51 Stereosonic console
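The comb-filter colouration described above can be illustrated with a deliberately simplified model. The sketch assumes the sum path lags the difference path by a pure delay tau (the real mismatch in the consoles was a frequency-dependent group delay, so this is an approximation), and the 0.5 ms figure and function names are hypothetical. For a hard-left input the sum and difference halves are both L/2, so recombining them gives a left-channel response of |cos(pi f tau)|, with notches at f = (2k + 1)/(2 tau).

```python
import cmath
import math


def left_output_gain(freq_hz, tau_s):
    """Magnitude at the left output for a hard-left input when the sum path
    lags the difference path by tau_s seconds (pure delay assumed)."""
    # Hard-left input: sum and difference halves are both L / 2.
    delayed_sum = 0.5 * cmath.exp(-2j * math.pi * freq_hz * tau_s)
    difference = 0.5
    # Recombining L = sum + difference gives |H| = |cos(pi * f * tau)|.
    return abs(delayed_sum + difference)


def notch_frequencies(tau_s, f_max_hz):
    """Comb notches where the two paths cancel: f = (2k + 1) / (2 * tau)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * tau_s) <= f_max_hz:
        notches.append((2 * k + 1) / (2.0 * tau_s))
        k += 1
    return notches


tau = 0.5e-3  # hypothetical 0.5 ms mismatch between sum and difference paths
for f in (0.0, 500.0, 1000.0, 2000.0):
    print(f"{f:6.0f} Hz: gain {left_output_gain(f, tau):.3f}")
print("notches below 5 kHz:", notch_frequencies(tau, 5000.0))
```

Even this idealised half-millisecond mismatch puts a deep notch at 1kHz and repeats it every 2kHz, which is the kind of colouration that would drive engineers to switch the circuit out.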

Contemporary paperwork reveals that the EMI REDD engineers were already aware of compromises in the Shuffler circuit, as this equivocal, hand-amended passage in the Abbey Road paperwork indicates:

The hand-written "almost" and "substantially" speak volumes and hint at arguments in which recording engineers had maintained to their REDD counterparts that they preferred working without the Shufflers in circuit, favouring a flat frequency response over improved stereo imaging. Few contemporary engineers would disagree. But the net result was the loss of a vital part of a stereo system based on amplitude-only stereophony, and we are all the poorer for it.

HF crosstalk compensation

In the late nineteen-nineties, I suggested an alternative to the EMI Shuffler which implemented the same effect by means of inter-channel crosstalk (Brice 1997, 1998). This process produced the same narrowing of the HF image relative to the LF image, using a simple circuit incapable of introducing other frequency-response artefacts. Functionally, it was identical to the EMI Stereo Shuffler, but it sidestepped the complications and compromises of the EMI implementation. The technique was commercially exploited in the FRANCINSTIEN^[8] range of stereophonic image enhancement systems developed by Perfect Pitch Music Ltd, and both hi-fi and recording-studio units were produced.

Figure 19 - The FRANCINSTIEN process and a commercial hi-fi version of the network
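The crosstalk idea described above can be sketched as follows. This is a hypothetical illustration, not the FRANCINSTIEN circuit itself: it assumes each channel is split into complementary LF and HF bands (a one-pole low-pass and its remainder, so the bands sum back exactly), that a fraction k of each channel's HF band is injected into the other channel, and that the HF band is divided by (1 + k) so an in-phase signal keeps unity gain. The values k = 0.25 and 700Hz, and the function names, are assumptions.

```python
import math


def complementary_split(x, cutoff_hz, fs):
    """Split x into LF (one-pole low-pass) and HF = x - LF; the bands sum to x."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    lf, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        lf.append(state)
    hf = [s - l for s, l in zip(x, lf)]
    return lf, hf


def hf_crosstalk(left, right, fs, k=0.25, cutoff_hz=700.0):
    """Hypothetical HF crosstalk network: inject k of each HF band into the
    other channel, normalised by (1 + k) so centre images keep unity gain."""
    l_lf, l_hf = complementary_split(left, cutoff_hz, fs)
    r_lf, r_hf = complementary_split(right, cutoff_hz, fs)
    new_left = [lf + (lh + k * rh) / (1.0 + k)
                for lf, lh, rh in zip(l_lf, l_hf, r_hf)]
    new_right = [lf + (rh + k * lh) / (1.0 + k)
                 for lf, rh, lh in zip(r_lf, r_hf, l_hf)]
    return new_left, new_right


fs = 48000
n = 4000
hard_left_hf = [(-1.0) ** i for i in range(n)]  # Nyquist-rate tone, left only
out_l, out_r = hf_crosstalk(hard_left_hf, [0.0] * n, fs)
print("HF right/left ratio:", round(abs(out_r[-1]) / abs(out_l[-1]), 3))
```

In sum/difference terms this network leaves the HF sum untouched and scales the HF difference by (1 - k)/(1 + k), which is why injecting crosstalk can reproduce the effect of a shelf in the difference channel without a separate sum/difference path that needs delay matching.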

Recent developments of the FRANCINSTIEN circuit

Time does not stand still, and experiments with the FRANCINSTIEN matrix technique have continued. This final section reports on recent work to improve the crosstalk-based stereo-image correction filters.

There have been many studies of the illusion of the spatial sound field produced by two loudspeakers, but one of the best was produced by D.M. Leakey in his PhD thesis and in a paper published in JASA (Leakey 1959). It stands out for its clean technique, its reference to and corroboration of earlier studies, its understanding of all the mechanisms described, and its useful mathematical models for further work.

Leakey, working contemporaneously with the EMI team but temporarily unencumbered by commercial constraints, was free to derive and test theoretical models for LF and HF imaging. He studied the apparent position of a sound image for a given inter-channel intensity difference, using material grouped into low-frequency stimuli (band-limited noise, filtered speech, and single and dual tones) and high-frequency stimuli (HF band-limited noise, filtered speech, pulses and five-component, unsynchronised tones).

Figure 20 - Leakey's experimental set-up.

His results corroborate the findings of other researchers that, for the same inter-channel intensity difference, a high-frequency sound and a low-frequency sound appear in different positions. His experimental set-up is illustrated in Figure 20. As an example, Leakey found that, for a high-frequency stimulus to seem to come from position R1 (or R7), a channel intensity difference of 12.4dB was required. A low-frequency stimulus, however, required a difference nearly 5dB greater to appear to come from the same position. ^[9]
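To put Leakey's figures in perspective, the dB differences can be converted to amplitude ratios with ratio = 10^(dB/20). The sketch below treats "nearly 5dB more" as 5dB, which is an approximation; the helper name is hypothetical.

```python
def db_to_ratio(db):
    """Amplitude ratio corresponding to an inter-channel level difference in dB."""
    return 10.0 ** (db / 20.0)


hf_db = 12.4         # Leakey: HF stimulus heard at R1 (or R7)
lf_db = hf_db + 5.0  # "nearly 5dB more", approximated here as 5 dB
print(f"HF: {hf_db} dB -> {db_to_ratio(hf_db):.2f}:1")
print(f"LF: ~{lf_db} dB -> {db_to_ratio(lf_db):.2f}:1")
```

So the HF image reaches R1 at roughly a 4.2:1 amplitude ratio, while the LF image needs something approaching 7.4:1 to get to the same place.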

Leakey derived mathematical models to predict the sound image position in a stereo listening arrangement. At LF the image position is given by,

Where α is the perceived angle of the phantom image from the mid-line and θ is the offset angle of the loudspeakers from the mid-line. This result matches EMI's equation (B), with compensation (in the form of tan θ) for the reduced stage angle.
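As an illustration, the sketch below assumes the LF law takes the familiar stereophonic tangent-law form suggested by the tan θ term, tan α = tan θ · (L - R)/(L + R). That form is an assumption here, as is the function name; it converts an inter-channel difference in dB into an LF image angle for a given loudspeaker half-angle.

```python
import math


def lf_image_angle_deg(delta_db, speaker_deg=30.0):
    """LF phantom-image angle from the mid-line, assuming the tangent law:
    tan(alpha) = tan(theta) * (L - R) / (L + R), with L / R set by delta_db."""
    ratio = 10.0 ** (delta_db / 20.0)   # L / R amplitude ratio
    k = (ratio - 1.0) / (ratio + 1.0)   # (L - R) / (L + R)
    theta = math.radians(speaker_deg)
    return math.degrees(math.atan(k * math.tan(theta)))


for d in (0.0, 3.0, 6.0, 12.4, 17.4, 30.0):
    print(f"{d:5.1f} dB -> {lf_image_angle_deg(d):5.2f} degrees")
```

Under this assumption, a sine-cosine pan-pot (L = cos φ, R = sin φ) gives (L - R)/(L + R) = tan(45° - φ), so with θ = 45° the image sits at α = 45° - φ. That agrees with footnote [3]: the pan-pot encodes ±45°, and it is the ±30° loudspeaker angle that compresses the image.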

At HF, Leakey derived the following, more complicated, law^[10].

Leakey's two models are plotted on the same axes in Fig. 21. Once again, the overall point is simply made: for a given channel intensity difference, the HF components of an instrumental or vocal contribution will subtend a greater angle at the listening position than will the LF components.

Figure 21 - HF and LF phantom image positions vs. inter-channel intensity difference on an untreated stereo system

The effect of the EMI Shuffler "plugged into" these two models is illustrated in Fig. 22. The curves show that, despite its benefits over untreated signals, especially in the all-important central region of the stereo image, the Shuffler actually over-compensates the HF image, causing it to fall inside the LF image at the extreme image positions. An identical result is obtained with the original FRANCINSTIEN which, deferentially, aped the EMI circuit parameters.

Figure 22 - HF and LF phantom image positions vs. inter-channel intensity difference on stereo system with EMI Shuffler (or FRANCINSTIEN) compensation

Is it possible to engineer a frequency-dependent channel-intensity modification that brings the two models closer and achieves a better match between the LF and HF images? The answer is yes, and the result is illustrated in Fig. 23, where an almost perfect match between LF and HF imaging was obtained by iterative adjustment of the inter-channel intensity ratios at high frequencies.

Figure 23 - HF and LF phantom image positions vs. inter-channel intensity difference on a stereo system with "Bride of FRANCINSTIEN" compensation
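One way to automate the iterative adjustment described above is a simple root-finding loop: for each input inter-channel difference, search for the HF difference whose HF image angle equals the LF image angle. The sketch below uses bisection; the two curves `lf_toy` and `hf_toy` are hypothetical stand-ins chosen only so that the HF image moves outwards faster than the LF image, and are not Leakey's laws.

```python
import math


def invert_increasing(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Bisection: find x in [lo, hi] with f(x) == target for increasing f."""
    if not f(lo) <= target <= f(hi):
        raise ValueError("target is outside f([lo, hi])")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def hf_correction_table(lf_angle, hf_angle, input_db_values, max_db=60.0):
    """For each input difference d >= 0 dB, find the HF difference whose HF
    image angle equals the LF image angle at d (negative d by symmetry)."""
    return [(d, invert_increasing(hf_angle, lf_angle(d), 0.0, max_db))
            for d in input_db_values]


# Hypothetical stand-in curves, NOT Leakey's laws.
def lf_toy(d_db):
    return 30.0 * (1.0 - math.exp(-d_db / 12.0))


def hf_toy(d_db):
    return 30.0 * (1.0 - math.exp(-d_db / 8.0))


for d, d_hf in hf_correction_table(lf_toy, hf_toy, [3.0, 6.0, 12.0, 18.0]):
    print(f"input {d:5.1f} dB -> HF should be {d_hf:5.2f} dB")
```

With these toy curves the required HF difference is exactly two-thirds of the input difference, which makes the output easy to check; with real image-position laws the table would be non-linear, and its entries are what a crosstalk network would then have to approximate.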

From this information it is a relatively simple matter to recalculate the crosstalk components in the FRANCINSTIEN matrix to effect the same processing. The result is a new filter dubbed Bride of FRANCINSTIEN.
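To see how a crosstalk coefficient relates to a target HF intensity difference, assume (hypothetically) a symmetric, level-normalised HF crosstalk of coefficient k, so that L' ∝ L + kR and R' ∝ R + kL. An input amplitude ratio r = L/R then leaves as r' = (r + k)/(1 + kr), and solving for k gives k = (r - r')/(r r' - 1). The function names are illustrative, and the actual Bride of FRANCINSTIEN coefficients are not reproduced here.

```python
import math


def db_to_ratio(db):
    return 10.0 ** (db / 20.0)


def ratio_to_db(r):
    return 20.0 * math.log10(r)


def crosstalk_output_db(input_db, k):
    """HF inter-channel difference after symmetric crosstalk k:
    r' = (r + k) / (1 + k * r)."""
    r = db_to_ratio(input_db)
    return ratio_to_db((r + k) / (1.0 + k * r))


def crosstalk_for_target(input_db, target_db):
    """Crosstalk k mapping input_db to target_db (needs input_db > target_db > 0)."""
    r, rt = db_to_ratio(input_db), db_to_ratio(target_db)
    if not r > rt > 1.0:
        raise ValueError("need input_db > target_db > 0")
    return (r - rt) / (r * rt - 1.0)


k = crosstalk_for_target(17.0, 12.4)
print(f"k = {k:.4f}")
for d in (3.0, 6.0, 12.0, 17.0, 30.0):
    print(f"{d:5.1f} dB in -> {crosstalk_output_db(d, k):5.2f} dB out at HF")
```

A single k fixes the whole mapping: a hard-panned source can never exceed 20·log10(1/k) dB at HF, so matching one point of the curve does not guarantee a match elsewhere. That is why the correction has to be fitted across the full range of image positions.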


References

Blumlein, A. (1933) British Patent 394,325, June 14th.
Clark, Dutton and Vanderlyn (1958) The Stereosonic recording and reproducing system: a two-channel system for domestic tape records. JAES 6(2), pp. 102-117.
REDD (1959) The EMI Stereosonic Recording Circuits: Sum Difference, Spreader and Shuffler. EMI REDD department, ref. RSL.51.
Blauert, J. (1983) Spatial Hearing. MIT Press.
Steinberg, J.C. & Snow, W.B. (1934) Auditory Perspective - Physical Factors. Electrical Engineering, Jan. 1934.
Brice, R. (1997) Multimedia and Virtual Reality Engineering. Newnes.
Brice, R. (1998) Music Engineering. Newnes.
Leakey, D.M. (1959) Measurements on the Effects of Interchannel Intensity and Time Differences in Two Channel Sound Systems. JASA Vol. 31, No. 7, July 1959.

Footnotes

[1] Note that, if phase angles are directly proportional to frequency, as they are here, this is equivalent to a time delay.

[2] The system EMI labelled the Stereosonic system. It is worth noting, though, that Blumlein never referred to the system as "stereophonic" or "Stereosonic"; for him it remained the Binaural sound system.

[3] It also indicates that a classic sine-cosine pan-pot actually encodes information over a ±45° angle; it is only the limited loudspeaker angle that restricts the image to ±30°.

[4] Because, at frequencies above about 700Hz (a frequency whose wavelength is approximately equal to the dimensions of the human head), a system based on phase difference becomes ambiguous: there could be an unknown number of whole cycles between the signals received at the two ears.

[5] Clark's equations (1) and (2) are equivalent to equations (A) and (C) above.

[6] Wendt in fact used tone-bursts: short, modulated bursts of a sine wave source.

[7] Despite being discussed since the earliest days of stereophony, the term Stereo Shuffling remains the subject of much confusion. This is not surprising, because the term actually refers to two quite separate techniques. Simply put, the earliest use of the term (coined by no less than Blumlein himself) refers to the processing of near-spaced omnidirectional microphone signals so that they reproduce correctly on loudspeakers. This is not the Shuffler referred to here, which was invented some thirty years later! Why use the same name? Well, the later Shuffler was invented by the EMI team who had worked with Alan Blumlein before he was killed in WW2. Perhaps they sought to honour him by adopting a term which derived from him?

[8] A monstrous acronym for: Frequency-dependent, ANalogue Crosstalk Injection Network for STereo Image Enhancement.

[9] Leakey also notes that the standard deviation in his group of listeners was small: 1dB in each case, demonstrating that the "stereo illusion" is consistent between subjects.

[10] The exponential terms derive from psychophysical considerations which indicate that, at HF, the ear is sensitive to the envelope of the signal (see Leakey 1959). The constant m is a head-shadow constant.


Apple Certified Developer. Stereo Lab, Aria 51, Aria 20, Head Space, Groove Sleuth, iLOOP and FRANCINSTIEN T-Sym are trademarks of Pspatial Audio. FRANCINSTIEN and Bride of FRANCINSTIEN (BoF) are trademarks of Phaedrus Audio. Macintosh and the Mac logo are trademarks of Apple Computer, Inc.